Abstract. An algorithm is developed with the goal of producing the most statistically significant signature list for distinguishing between two candidate models given a set of LHC observations.

Recently in [1] we investigated the LHC inverse problem first rigorously posed by Arkani-Hamed et al. in [2] and showed that non-collider data can be used to remove the degeneracies observed in the collider data. In [3] we attacked the same problem in the context of determining the non-universality in the gaugino sector by using LHC signatures. In this short note we summarize the statistical methods we used to optimize the set of signatures in order to minimize the integrated luminosity required to resolve the degeneracies.

We define a chi-square-like distance function between any two models A and B as the metric of the signature space, very similar to the one used in [2]:

$$ (\Delta S_{AB})^2 = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{S_i^A - S_i^B}{\delta S_i^{AB}} \right)^2, \tag{1} $$

where $S_i$ is the counting signature and $\delta S_i^{AB}$ is the uncertainty of the numerator, i.e. of the difference between the signatures, which we will assume to contain only statistical errors. We can identify any signature $S_i$ with an "effective" cross section $\bar{\sigma}_i = S_i/L$, which includes the geometric cuts that are performed on the data, the detector efficiencies, etc. At large integrated luminosity this converges to an "exact" cross section $\sigma_i = \lim_{L \to \infty} \bar{\sigma}_i$. Rewriting the metric in terms of these effective cross sections gives us

$$ (\Delta S_{AB})^2 = \frac{1}{n} \sum_{i=1}^{n} \frac{(\bar{\sigma}_i^A - \bar{\sigma}_i^B)^2}{\bar{\sigma}_i^A / L_A + \bar{\sigma}_i^B / L_B}, \tag{2} $$

where $L_A$ and $L_B$ are the integrated luminosities that are used to compute the effective cross sections.
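As a small illustration of the metric written in terms of effective cross sections, the following Python sketch evaluates it for two lists of per-signature cross sections. The function name `delta_s_squared` and the example numbers are hypothetical, and the code assumes both models are described by the same ordered list of signatures with purely statistical errors.

```python
from math import fsum


def delta_s_squared(sigma_a, sigma_b, lum_a, lum_b):
    """Chi-square-like distance between models A and B.

    sigma_a, sigma_b: effective cross sections, one entry per signature.
    lum_a, lum_b: integrated luminosities in the inverse units of sigma.
    Hypothetical helper; assumes statistical errors only.
    """
    if len(sigma_a) != len(sigma_b) or not sigma_a:
        raise ValueError("need two non-empty lists of equal length")
    # Each term is (difference)^2 / (variance of the difference).
    # A signature with zero cross section in both models would divide by zero.
    terms = [
        (a - b) ** 2 / (a / lum_a + b / lum_b)
        for a, b in zip(sigma_a, sigma_b)
    ]
    return fsum(terms) / len(terms)


# Illustrative numbers only: two signatures, equal luminosities.
example = delta_s_squared([1.0, 2.0], [1.5, 2.0], 10.0, 10.0)
print(example)
```

Signatures on which the two models agree contribute nothing to the sum but still count in the factor 1/n, so adding them lowers the distance.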

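We can obtain the statistical properties of this metric by replacing each signature (or effective cross section) by a random variable following a normal distribution. After this randomization, the effective cross sections simply become

$$ \bar{\sigma}_i^A = S_i^A / L_A = \sigma_i^A + \sqrt{\sigma_i^A / L_A}\; Z_A, \tag{3} $$

with a similar expression for model B. Substituting (3) into (2) simply gives

$$ (\Delta S_{AB})^2 = \frac{1}{n} \sum_{i=1}^{n} \frac{\left( \sigma_i^A - \sigma_i^B + \sqrt{\sigma_i^A / L_A + \sigma_i^B / L_B}\; Z_i \right)^2}{\sigma_i^A / L_A + \sigma_i^B / L_B}, \tag{4} $$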
where $Z_A$, $Z_B$ and $Z_i$ are independent standard normal random variables. Assuming all $Z_i$ are independent, i.e. our $n$ signatures are independent of each other, $(\Delta S_{AB})^2$ is itself a random variable having a non-central chi-square distribution

$$ P(\Delta S^2) = n\, \chi^2_{n,\lambda}(n\, \Delta S^2), \tag{5} $$

where $\lambda$ is the non-centrality parameter, which is given by

$$ \lambda = \sum_{i=1}^{n} \frac{(\sigma_i^A - \sigma_i^B)^2}{\sigma_i^A / L_A + \sigma_i^B / L_B}. \tag{6} $$
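The non-central chi-square statement above can be checked by simulation. The sketch below draws n times the metric under the Gaussian model and compares its sample mean with n + λ, the mean of a non-central chi-square variable with n degrees of freedom. All names and cross-section values are hypothetical, and the uncertainties are evaluated at the exact cross sections, which is an assumption of this sketch.

```python
import math
import random


def noncentrality(sigma_a, sigma_b, lum_a, lum_b):
    """Non-centrality parameter: sum of squared differences over variances."""
    return sum(
        (a - b) ** 2 / (a / lum_a + b / lum_b)
        for a, b in zip(sigma_a, sigma_b)
    )


def sample_n_delta_s2(sigma_a, sigma_b, lum_a, lum_b, rng):
    """Draw one value of n * (Delta S_AB)^2 with one standard normal per signature."""
    total = 0.0
    for a, b in zip(sigma_a, sigma_b):
        var = a / lum_a + b / lum_b        # variance of the difference
        shift = (a - b) / math.sqrt(var)   # offset in units of the error
        total += (shift + rng.gauss(0.0, 1.0)) ** 2
    return total


rng = random.Random(12345)                 # fixed seed for reproducibility
sigma_a = [1.0, 2.0, 0.5]                  # illustrative values only
sigma_b = [1.5, 2.0, 0.8]
lum = 20.0

lam = noncentrality(sigma_a, sigma_b, lum, lum)
draws = [sample_n_delta_s2(sigma_a, sigma_b, lum, lum, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)

# The two printed numbers should agree within the Monte Carlo error.
print(mean, len(sigma_a) + lam)
```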
The case $\lambda = 0$ ($\lambda \neq 0$) corresponds to comparing a model to itself (to a different model) by using two sets of independent measurements. Figure 1 shows how the $(\Delta S_{AB})^2$ distribution favors larger values as $\lambda$ increases. Since our goal is to tell two models apart, we want the $(\Delta S_{AB})^2$ values we can get from this comparison to be safely away from the values we get by comparing a model to itself, i.e. the $\lambda = 0$ case. If we quantify this safety condition as the requirement that $(100 \times p)\%$ of the distributions do not overlap, i.e. $(100 \times p)\%$ of the values we get by comparing the same model to itself are less than $(100 \times p)\%$ of the values we get by comparing two different models, we obtain the following equations

$$ \int_{0}^{\gamma} n\, \chi^2_{n,0}(n\, \Delta S^2)\, d(\Delta S^2) = p, \tag{7} $$

$$ \int_{\gamma}^{\infty} n\, \chi^2_{n,\lambda_{\min}}(n\, \Delta S^2)\, d(\Delta S^2) = p, \tag{8} $$

which can be solved numerically to compute a $\lambda_{\min}$ value (see Table 1) for every number of signatures $n$ and non-overlap fraction (or confidence level) $p$. Here $\gamma$ is the $(\Delta S)^2$ cut-off value below which $(100 \times p)\%$ of the values obtained by comparing a model to itself lie. This condition gives Eq. (7), which is solved numerically for $\gamma$. This $\gamma$ value is then used as the lower cut-off in Eq. (8), which is again solved numerically for $\lambda_{\min}$.

TABLE 1. List of $\lambda_{\min}(n, p)$ values for various values of the parameters $n$ and $p$.

             Confidence level p
   n      0.95     0.975      0.99     0.999
   1     12.99     17.65     24.03     40.71
   2     15.44     20.55     27.41     44.99
   3     17.17     22.60     29.83     48.10
   4     18.57     24.27     31.79     50.66
   5     19.78     25.71     33.50     52.88
   6     20.86     26.99     35.02     54.88
   7     21.84     28.16     36.41     56.71
   8     22.74     29.25     37.69     58.40
   9     23.59     30.26     38.89     59.99
  10     24.39     31.21     40.02     61.48
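The two-step numerical solution described above can be sketched in pure Python. This is one possible implementation, not necessarily the one used for the table. It assumes the central chi-square CDF is evaluated with the series for the regularized incomplete gamma function, the non-central CDF as a Poisson-weighted mixture of central ones, and both conditions are solved by bisection. All function names are hypothetical.

```python
import math


def chi2_cdf(x, k):
    """CDF of a central chi-square with k degrees of freedom (series form)."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * k, 0.5 * x
    term = 1.0 / a
    total = term
    denom = a
    while term > 1e-17 * total:
        denom += 1.0
        term *= z / denom
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def ncx2_cdf(x, k, lam):
    """CDF of a non-central chi-square as a Poisson mixture of central ones."""
    if lam <= 0.0:
        return chi2_cdf(x, k)
    half = 0.5 * lam
    total, j = 0.0, 0
    while True:
        weight = math.exp(j * math.log(half) - half - math.lgamma(j + 1.0))
        total += weight * chi2_cdf(x, k + 2 * j)
        if j > half and weight < 1e-17:
            return total
        j += 1


def solve_increasing(f, target, iters=100):
    """Solve f(x) = target for an increasing f on x >= 0 by bisection."""
    lo, hi = 0.0, 1.0
    while f(hi) < target:          # grow the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda_min(n, p):
    """Smallest non-centrality satisfying both non-overlap conditions."""
    # First condition: n * gamma is the p-quantile of the central distribution.
    x_cut = solve_increasing(lambda x: chi2_cdf(x, n), p)
    # Second condition: a fraction p of the non-central distribution lies above it.
    return solve_increasing(lambda lam: 1.0 - ncx2_cdf(x_cut, n, lam), p)


# Table 1 lists 12.99 for n = 1 and p = 0.95.
print(round(lambda_min(1, 0.95), 2))
```

Working with the variable $n\,\Delta S^2$ rather than $\Delta S^2$ itself removes the explicit factors of $n$ from both integrals, so only the cumulative distribution functions are needed.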

The condition for two models to be distinguishable is simply $\lambda > \lambda_{\min}$. In this inequality $\lambda_{\min}$ is just a numerically computed number which is independent of the physics involved in the collider experiment. All the physics is in $\lambda$, which is a function of the cross sections given by each signature.

Let us now assume that "model A" is the experimental data, which corresponds to an integrated luminosity of $L^{\rm exp}$, and "model B" is the simulation, with integrated luminosity $L^{\rm sim} = q L^{\rm exp}$. We might imagine that $q$ can be arbitrarily large, limited only by computational resources. Let us make one final notational definition

$$ R = \sum_{i=1}^{n} \frac{(\sigma_i^{\rm exp} - \sigma_i^{\rm sim})^2}{\sigma_i^{\rm exp} + \sigma_i^{\rm sim}/q}, \tag{9} $$

so that the non-centrality parameter becomes $\lambda = L^{\rm exp} R$. We can then compute the minimum amount of luminosity required for two models to be distinguishable, which is given by

$$ L_{\min} = \frac{\lambda_{\min}(n, p)}{R}. \tag{10} $$

If the two models we want to compare are very similar in all the channels (signatures) we consider, then $R$ will be small and $L_{\min}$ will be large. If, on the other hand, the models are very different, $R$ will be large and $L_{\min}$ will be small. This is of course what we expect: similar models require more integrated luminosity to distinguish.
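To make the scaling concrete, here is a minimal sketch of the quantity R and the resulting minimum luminosity. The function names and the cross-section values are hypothetical. The value 15.44 is the Table 1 entry for n = 2 and p = 0.95, and the cross sections are assumed to be in fb so that the result is in inverse fb.

```python
def r_value(sigma_exp, sigma_sim, q):
    """Sum over signatures of (difference)^2 / (sigma_exp + sigma_sim / q)."""
    return sum(
        (e - s) ** 2 / (e + s / q)
        for e, s in zip(sigma_exp, sigma_sim)
    )


def minimum_luminosity(sigma_exp, sigma_sim, q, lam_min):
    """Luminosity at which the non-centrality parameter reaches lam_min."""
    r = r_value(sigma_exp, sigma_sim, q)
    if r <= 0.0:
        raise ValueError("models are identical in all signatures")
    return lam_min / r


# Illustrative numbers only: two signatures, simulation 10 times larger than data.
sigma_exp = [10.0, 4.0]
sigma_sim = [12.0, 4.0]
l_min = minimum_luminosity(sigma_exp, sigma_sim, q=10.0, lam_min=15.44)
print(l_min)
```

Increasing q shrinks the simulation term in the denominator, which raises R and lowers the required luminosity, but only up to the limit set by the statistical error of the data.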

Now the question is how to make $L_{\min}$ as small as possible. We see from Table 1 that $\lambda_{\min}$ increases as $n$ increases, and since $R$ is a sum of positive quantities it increases with $n$ as well. Therefore using more signatures does not necessarily help in distinguishing models and, moreover, the signature space (or at least the relevant part of it, see [2]) is not big enough to allow multiple independent directions. It is easy to see the orthogonality of signatures such as the number of events with 1 lepton and with 2 leptons, but for more general cases, such as kinematic histograms which we can integrate between limits that are also optimized to increase distinguishability, we need to compute the correlation coefficient between different signatures $a$ and $b$, which is given by

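$$ \rho_{ab} = \frac{\sum_{k=1}^{N} (\sigma_k^a - \bar{\sigma}^a)(\sigma_k^b - \bar{\sigma}^b)}{\sqrt{\sum_{k=1}^{N} (\sigma_k^a - \bar{\sigma}^a)^2\, \sum_{k=1}^{N} (\sigma_k^b - \bar{\sigma}^b)^2}} \quad \text{for large } N, \tag{11} $$

where the $\sigma_k^a$ represent the individual results obtained from each of the $N$ cross section measurements, labeled by the index $k$, and $\bar{\sigma}^a$ is their mean. This correlation matrix $\rho_{ab}$ can then be used to determine the compatible observables, i.e. the ones whose mutual correlation does not exceed some fixed threshold $\epsilon$. This gives us the adjacency matrix of a graph, which we define as

$$ A_{ab} = \begin{cases} 1 & \text{if } |\rho_{ab}| \le \epsilon, \\ 0 & \text{if } |\rho_{ab}| > \epsilon. \end{cases} \tag{12} $$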
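The construction of the graph can be sketched as follows. The code assumes the standard sample (Pearson) correlation coefficient and the convention that an edge joins two signatures whose correlation does not exceed the threshold. The function names, the data and the threshold value are hypothetical.

```python
import math


def correlation(xs, ys):
    """Sample Pearson correlation of two equally long measurement lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # A constant signature (zero variance) would divide by zero here.
    return cov / math.sqrt(var_x * var_y)


def compatibility_graph(measurements, eps):
    """Adjacency matrix with an edge wherever |rho_ab| <= eps.

    measurements[a] holds the N repeated results for signature a.
    """
    m = len(measurements)
    adj = [[0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            if abs(correlation(measurements[a], measurements[b])) <= eps:
                adj[a][b] = adj[b][a] = 1
    return adj


# Illustrative data: signatures 0 and 1 are strongly correlated,
# signature 2 is nearly uncorrelated with both.
data = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 6.0, 8.0, 10.5],
    [3.0, 1.0, 4.0, 1.0, 3.0],
]
adj = compatibility_graph(data, eps=0.1)
print(adj)
```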

Now finding the compatible observables is equivalent to finding all the complete subgraphs (or "cliques") of that graph, which is a well-known problem in graph theory. Each of these complete subgraphs gives us an $L_{\min}$ value, and the one giving the minimum contains the list of signatures we want to combine.
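The selection step can be sketched with a brute-force search over all complete subgraphs. This is only practical for a small number of signatures; for larger graphs a dedicated clique algorithm such as Bron-Kerbosch would be a natural replacement. The function names, the adjacency matrix and the per-signature contributions to R are hypothetical, while the $\lambda_{\min}$ values are the p = 0.95 column of Table 1.

```python
from itertools import combinations

# lambda_min(n, p = 0.95) for n = 1..5, copied from Table 1.
LAMBDA_MIN_95 = {1: 12.99, 2: 15.44, 3: 17.17, 4: 18.57, 5: 19.78}


def all_cliques(adj):
    """Yield every non-empty complete subgraph as a tuple of vertex indices."""
    m = len(adj)
    for size in range(1, m + 1):
        for subset in combinations(range(m), size):
            if all(adj[a][b] for a, b in combinations(subset, 2)):
                yield subset


def best_signature_list(adj, r_terms, lam_min_table):
    """Return (L_min, clique) for the clique with the smallest L_min.

    r_terms[i] is the contribution of signature i to the sum R.
    """
    best = None
    for clique in all_cliques(adj):
        r = sum(r_terms[i] for i in clique)
        if r <= 0.0:
            continue                      # this clique cannot separate the models
        l_min = lam_min_table[len(clique)] / r
        if best is None or l_min < best[0]:
            best = (l_min, clique)
    return best


# Illustrative input: signatures 0 and 1 are too correlated to be combined.
adj = [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
r_terms = [0.30, 0.25, 0.20]
best = best_signature_list(adj, r_terms, LAMBDA_MIN_95)
print(best)
```

Non-maximal cliques are included on purpose: because $\lambda_{\min}$ grows with the number of signatures, a smaller clique can give a lower $L_{\min}$ than a larger one that contains it.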